(!) Please ask about problems and questions regarding this tutorial on answers.ros.org. Don't forget to include in your question the link to this page, the versions of your OS & ROS, and also add appropriate tags.

Learning templates online

Description: In this tutorial we learn how to write a program that learns and detects dominant orientation templates.

Tutorial Level: BEGINNER

Overall structure

For this task we use two nodes: the learner, which we will write in the following (and which is already available in fast_template_detector/src/learners/adaptive_template_learner.cpp), and the gradient_discretizer (see fast_template_detector/src/discretizers/gradient_discretizer.cpp).

The gradient_discretizer

The gradient_discretizer subscribes to an image topic (/narrow_stereo/left/image_raw by default if the script fast_template_detector/run_gradient_discretizer.sh is used to run it) and publishes, for each image, a DiscretizedData message containing the discretized binary data.
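
To verify that the discretizer is up and publishing, the standard ROS tools can be used; '/discretized_data' is the topic name the learner below subscribes to (the run scripts may remap it):

$ rostopic hz /discretized_data
$ rostopic echo -n 1 /discretized_data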

The learner

The learner subscribes to an image topic and to DiscretizedData. The image is used for visualisation, the DiscretizedData for detecting the learned templates. In each frame the learner tries to detect the already learned templates and uses the response of the best match to decide whether a new template should be learned: if the response is below a certain threshold, a new template is learned at the position of the best match.
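
The decision rule itself can be stated compactly. The following self-contained sketch is ours (the function name is illustrative, not part of the package): the detector only reports candidates that reach the detection threshold, and a new template is learned whenever the best reported match is still weaker than the learning threshold. Both thresholds are percentages (0-100) of the maximum possible matching response.

// Illustrative sketch of the adaptive learning rule; not part of the package.
// All values are percentages (0-100) of the maximum possible matching response.
bool shouldLearnNewTemplate (
  const int bestResponse,        // response of the best detected candidate
  const int detectionThreshold,  // minimum response to count as a detection
  const int learnThreshold )     // below this, the match is considered too weak
{
  // a weak but present detection triggers learning a new template at its position
  return bestResponse >= detectionThreshold && bestResponse < learnThreshold;
}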

The code

In the following we explain the code of the learner, which can be found in fast_template_detector/src/learners/adaptive_template_learner.cpp, and discuss its individual parts.

The main function

typedef ::message_filters::sync_policies::ApproximateTime< 
  ::sensor_msgs::Image, 
  ::fast_template_detector::DiscretizedData > SyncPolicy;

int
main(
  int argc,
  char ** argv )
{
  ::ros::init (argc, argv, "template_learner");
  ::ros::NodeHandle nodeHandle ("~");
      
  ::message_filters::Subscriber< ::sensor_msgs::Image > image_sub (
    nodeHandle, "/image", 20);
  ::message_filters::Subscriber< ::fast_template_detector::DiscretizedData > data_sub (
    nodeHandle, "/discretized_data", 20);
  
  ::message_filters::Synchronizer<SyncPolicy> sync (SyncPolicy(20), image_sub, data_sub);
  sync.connectInput (image_sub, data_sub);
  
  AdaptiveTemplateLearner learner (
    nodeHandle, 
    (argc > 1) ? argv[1] : "test.ftd", 
    (argc > 2) ? atoi (argv[2]) : 80, // detection threshold
    (argc > 3) ? atoi (argv[3]) : 95, // learning threshold
    (argc > 4) ? argv[4] : "raw", sync);
  
  ::ros::spin ();
}

In the main function we first initialize the ROS system and create a node handle. We then create the subscribers for the images and the discretized data, create a synchronizer, and connect it with the subscribers so that the incoming messages are synchronized. Finally, we create the learner, specifying the name of the file used to store the learned templates, the threshold used to decide whether a template is detected at a certain position, and the threshold used to decide whether a new template should be learned. Both thresholds are percentages (0-100) of the maximum possible matching response.
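
For example, assuming the compiled binary keeps the name of its source file (the run script used later in this tutorial wraps a call of this kind), the learner could be started with an explicit template file and thresholds as follows; 'my_templates.ftd' is a placeholder name:

$ ./adaptive_template_learner my_templates.ftd 80 95 raw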

For the synchronization an ApproximateTime policy is used, so that messages whose time stamps are approximately equal are delivered to the callback together.

The constructor

class AdaptiveTemplateLearner
{

public:
  
  AdaptiveTemplateLearner (
    ros::NodeHandle & nh,
    const std::string& templateFileName, 
    const int threshold,
    const int learnThreshold,
    const std::string& transport,
    ::message_filters::Synchronizer<SyncPolicy> & sync_ )
  {
    fileName_ = templateFileName;
    
    threshold_ = threshold;
    learnThreshold_ = learnThreshold;
    
    roi_ = cvRect(0, 0, 640, 480);
    
    const int regionSize = 7;
    const int templateHorizontalSamples = 154/regionSize;
    const int templateVerticalSamples = 154/regionSize;
    const int numOfCharsPerElement = 1;
    
    templateDetector_ = new ::ftd::FastTemplateDetectorVS (
      templateHorizontalSamples, templateVerticalSamples, 
      regionSize, numOfCharsPerElement, 10);
    templateDetector_->addNewClass ();
        
    cvNamedWindow ("Image Window", CV_WINDOW_AUTOSIZE);
    mouse_.start ("Image Window");
    
    sync_.registerCallback (
      boost::bind (&AdaptiveTemplateLearner::learnCallback, this, _1, _2));    
    
    ROS_INFO("system ready");
  }

private:

  int threshold_;
  int learnThreshold_;
  
  ::ros::Subscriber dataSubscriber_;
  ::sensor_msgs::CvBridge bridge_;
  
  ::ftd::FastTemplateDetectorVS * templateDetector_;
  
  ::io::Mouse mouse_;
  
  ::std::string fileName_;
  
  CvRect roi_;
};

In the constructor we first specify the size of the template and the size of each discretization region and use this information to set up the template detector. After this, we add a new class to the detector, to which the newly learned templates will be added. For visualization and user interaction we create an OpenCV window and attach a mouse handler to it. Finally, we register the callback function for the learning, which is described in the next section.
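
To make these numbers concrete, here is the arithmetic behind the constants above: with a region size of 7 pixels, 154/7 = 22 samples fit in each direction, so a template covers 154x154 pixels and is represented by 22x22 = 484 discretized values (one char each, since numOfCharsPerElement is 1):

const int regionSize = 7;                                     // pixels per discretization region
const int samplesPerSide = 154 / regionSize;                  // 22 samples per direction
const int templateSideInPixels = samplesPerSide * regionSize; // 154 pixels
const int bytesPerTemplate = samplesPerSide * samplesPerSide; // 484 chars per template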

The learnCallback function

Next we explain the different parts of the function learnCallback. For the complete code of the function see fast_template_detector/src/learners/adaptive_template_learner.cpp.
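
Since the synchronizer delivers one message of each type defined in the SyncPolicy above, the callback has the following form (the parameter names msg and dataMsg are the ones used in the snippets below):

void
learnCallback (
  const ::sensor_msgs::ImageConstPtr & msg,
  const ::fast_template_detector::DiscretizedDataConstPtr & dataMsg )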

First, we convert the received image into an IplImage and extract the corresponding color and gray value images. To make the edge extraction less dependent on noise we also create a smoothed version of the gray value image:

    // get OpenCV image
    IplImage * image = bridge_.imgMsgToCv(msg); 
  
    // create color image
    IplImage * colorImage = util::ImageUtils::createColorImage32F (image);
    
    // convert to gray value image
    IplImage * grayImage = util::ImageUtils::createGrayImage32F (image);
        
    // smooth image
    IplImage * smoothedImage = cvCreateImage (
      cvGetSize (grayImage ), IPL_DEPTH_32F, 1);
    cvSmooth (grayImage, smoothedImage, CV_BLUR, 5);

In the next part we retrieve the current position of the mouse and visualize it in the image:

    // get current mouse position
    const int mousePosX = mouse_.getX ();
    const int mousePosY = mouse_.getY ();
    const int mouseEvent = mouse_.getEvent();
    
    
    // visualize template area
    int templateCenterX = -1;
    int templateCenterY = -1;
    if (mousePosX >= 0 && mousePosY >= 0)
    {
      templateCenterX = mousePosX;
      templateCenterY = mousePosY;              
    }
    // draw the template rectangle only if it lies completely inside the image
    // (all four conditions have to hold)
    if ( templateCenterX - templateDetector_->getTemplateWidth()/2 >= 0 &&
         templateCenterY - templateDetector_->getTemplateHeight()/2 >= 0 &&
         templateCenterX + templateDetector_->getTemplateWidth()/2 
           <= grayImage->width-1 &&
         templateCenterY + templateDetector_->getTemplateHeight()/2 
           <= grayImage->height-1 )
    {
      CvPoint pt1;
      CvPoint pt2;
      CvPoint pt3;
      CvPoint pt4;

      pt1.x = templateCenterX-templateDetector_->getTemplateWidth()/2;
      pt1.y = templateCenterY-templateDetector_->getTemplateHeight()/2;
      pt2.x = templateCenterX+templateDetector_->getTemplateWidth()/2;
      pt2.y = templateCenterY-templateDetector_->getTemplateHeight()/2;
      pt3.x = templateCenterX+templateDetector_->getTemplateWidth()/2;
      pt3.y = templateCenterY+templateDetector_->getTemplateHeight()/2;
      pt4.x = templateCenterX-templateDetector_->getTemplateWidth()/2;
      pt4.y = templateCenterY+templateDetector_->getTemplateHeight()/2;

      // draw the rectangle with a thick black outline and a thin yellow line on top
      cvLine(colorImage,pt1,pt2,CV_RGB(0,0,0),3);
      cvLine(colorImage,pt2,pt3,CV_RGB(0,0,0),3);
      cvLine(colorImage,pt3,pt4,CV_RGB(0,0,0),3);
      cvLine(colorImage,pt4,pt1,CV_RGB(0,0,0),3);

      cvLine(colorImage,pt1,pt2,CV_RGB(255,255,0),1);
      cvLine(colorImage,pt2,pt3,CV_RGB(255,255,0),1);
      cvLine(colorImage,pt3,pt4,CV_RGB(255,255,0),1);
      cvLine(colorImage,pt4,pt1,CV_RGB(255,255,0),1);
    }

If the right mouse button is pressed ('mouseEvent == 2'), we learn a new template at the position of the mouse pointer. For this, we first copy the data within the selected template region into a separate array and pass this array to the template detector, which stores it as a new template. For visualisation purposes we also supply the template detector with the smoothed image, so that it can compute the corresponding contours, which can later be visualised at detected positions. Finally, we re-cluster the templates in order to speed up the detection process:

    // create template if right mouse button has been pressed
    if (templateCenterX != -1 && templateCenterY != -1 && mouseEvent == 2)
    {
      const int regionWidth = templateDetector_->getSamplingSize ();
      const int regionHeight = templateDetector_->getSamplingSize ();
      const int templateHorizontalSamples 
        = templateDetector_->getTemplateWidth () 
        / templateDetector_->getSamplingSize ();
      const int templateVerticalSamples = templateDetector_->getTemplateHeight () 
        / templateDetector_->getSamplingSize ();
      unsigned char * templateData = new unsigned char[
        templateHorizontalSamples * templateVerticalSamples];

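      // imageWidth and imageHeight hold the dimensions of the received image;
      // they are set in a part of the callback that is not shown here
      // (see the complete source file)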
      const int horizontalSamples = imageWidth 
        / templateDetector_->getSamplingSize ();
      const int verticalSamples = imageHeight 
        / templateDetector_->getSamplingSize ();


      // copy data and count the non-zero bins; this count is the maximum
      // possible matching response for the new template
      int maxResponse = 0;
      for (int rowIndex = 0; rowIndex < templateVerticalSamples; ++rowIndex)
      {
        for (int colIndex = 0; colIndex < templateHorizontalSamples; ++colIndex)
        {
          const int tmpColIndex = colIndex + (templateCenterX 
            - templateDetector_->getTemplateWidth ()/2)/regionWidth;
          const int tmpRowIndex = rowIndex + (templateCenterY
            - templateDetector_->getTemplateHeight ()/2)/regionHeight;
            
          templateData[rowIndex*templateHorizontalSamples + colIndex] 
            = dataMsg->data[tmpRowIndex*horizontalSamples + tmpColIndex];
          
          if (dataMsg->data[tmpRowIndex*horizontalSamples + tmpColIndex] != 0)
          {
            ++maxResponse;
          }
        }
      }
      
      
      // add template
      const int templateWidth = templateDetector_->getTemplateWidth ();
      const int templateHeight = templateDetector_->getTemplateHeight ();
        
      const int templateStartX = templateCenterX-templateWidth/2;
      const int templateStartY = templateCenterY-templateHeight/2;
        
      templateDetector_->addNewTemplate (templateData, 0, maxResponse, 
        cvRect (templateStartX, templateStartY, templateWidth, templateHeight));
      
      cvSetImageROI (smoothedImage, 
        cvRect (templateStartX, templateStartY,
          templateDetector_->getTemplateWidth (), 
          templateDetector_->getTemplateHeight ()));
      templateDetector_->addContour (smoothedImage);
      cvResetImageROI (smoothedImage);
      
      ROS_INFO ("template added");
      
      delete[] templateData;
      
      templateDetector_->clearClusters ();
      templateDetector_->clusterHeuristically (4);
    }

As soon as at least one template has been learned from user input, we try to detect it in the current image data. For this, we first copy the received discretized data into a new array of type 'unsigned char' and then supply this array to the detector. Using this data the detector tries to find possible instances of the learned templates and returns the detection candidates as a list:

    // copy data
    const int horizontalSamples = dataMsg->horizontalSamples;
    const int verticalSamples = dataMsg->verticalSamples;
    
    unsigned char * data = new unsigned char[horizontalSamples*verticalSamples];
    
    int elementIndex = 0;
    for (std::vector<unsigned char>::const_iterator iter 
      = dataMsg->data.begin (); 
      iter != dataMsg->data.end (); 
      ++iter)
    {
      data[elementIndex] = *iter;
      
      ++elementIndex;
    }
    

    // detect templates
    const int threshold = threshold_;
    std::list< ::ftd::Candidate* > * candidateList = templateDetector_->process(
      data, threshold, dataMsg->horizontalSamples, dataMsg->verticalSamples); 

Given the list of candidates, we first visualise them, find the best one and, if its response value is below the learning threshold, add a new template at the position of the detected candidate. Adding the new template works in the same way as adding one with the mouse:

    if (candidateList != NULL)
    {
      for (int classIndex = 0; 
        classIndex < templateDetector_->getNumOfClasses(); 
        ++classIndex)
      {         
        // visualise candidates 
        for (std::list< ::ftd::Candidate* >::iterator candidateIter 
          = candidateList[classIndex].begin(); 
          candidateIter != candidateList[classIndex].end(); 
          ++candidateIter)
        {
          const int regionStartX = (*candidateIter)->getCol ();
          const int regionStartY = (*candidateIter)->getRow ();
          const int regionWidth = templateDetector_->getTemplateWidth();
          const int regionHeight = templateDetector_->getTemplateHeight();

          cvLine (
            colorImage,
            cvPoint (regionStartX, regionStartY),
            cvPoint (regionStartX+regionWidth, regionStartY),
            CV_RGB (0, 0, 255), 2 );
          cvLine (
            colorImage,
            cvPoint (regionStartX+regionWidth, regionStartY),
            cvPoint (regionStartX+regionWidth, regionStartY+regionHeight),
            CV_RGB (0, 0, 255), 2 );
          cvLine (
            colorImage,
            cvPoint (regionStartX+regionWidth, regionStartY+regionHeight),
            cvPoint (regionStartX, regionStartY+regionHeight),
            CV_RGB (0, 0, 255), 2 );
          cvLine (
            colorImage,
            cvPoint (regionStartX, regionStartY+regionHeight),
            cvPoint (regionStartX, regionStartY),
            CV_RGB (0, 0, 255), 2 );            
        }
                          
        // find best candidate
        int bestCandidateResponse = 0;
        std::list< ::ftd::Candidate* >::iterator bestCandidateIter 
          = candidateList[classIndex].end();
        for (std::list< ::ftd::Candidate* >::iterator candidateIter 
          = candidateList[classIndex].begin(); 
          candidateIter != candidateList[classIndex].end(); 
          ++candidateIter)
        {
          if ((*candidateIter)->getMatchingResponse () > bestCandidateResponse)
          {
            bestCandidateResponse = (*candidateIter)->getMatchingResponse ();
            bestCandidateIter = candidateIter;
          }
        }
                          
        if (bestCandidateIter != candidateList[classIndex].end())
        {
          const int regionStartX = (*bestCandidateIter)->getCol ();
          const int regionStartY = (*bestCandidateIter)->getRow ();
          const int regionWidth = templateDetector_->getTemplateWidth();
          const int regionHeight = templateDetector_->getTemplateHeight();
          
          cvLine (
            colorImage,
            cvPoint (regionStartX, regionStartY),
            cvPoint (regionStartX+regionWidth, regionStartY),
            CV_RGB (0, 255, 0), 2 );
          cvLine (
            colorImage,
            cvPoint (regionStartX+regionWidth, regionStartY),
            cvPoint (regionStartX+regionWidth, regionStartY+regionHeight),
            CV_RGB (0, 255, 0), 2 );
          cvLine (
            colorImage,
            cvPoint (regionStartX+regionWidth, regionStartY+regionHeight),
            cvPoint (regionStartX, regionStartY+regionHeight),
            CV_RGB (0, 255, 0), 2 );
          cvLine (
            colorImage,
            cvPoint (regionStartX, regionStartY+regionHeight),
            cvPoint (regionStartX, regionStartY),
            CV_RGB (0, 255, 0), 2 );
        }
                          
                          
        // if best response is below learning threshold then 
        // learn a new template at this position
        if ( bestCandidateIter != candidateList[classIndex].end()
          && bestCandidateResponse < learnThreshold_ )
        {
          const int startX = (*bestCandidateIter)->getCol ();
          const int startY = (*bestCandidateIter)->getRow ();

          const int regionWidth = templateDetector_->getSamplingSize ();
          const int regionHeight = templateDetector_->getSamplingSize ();
          const int templateHorizontalSamples 
            = templateDetector_->getTemplateWidth () 
            / templateDetector_->getSamplingSize ();
          const int templateVerticalSamples 
            = templateDetector_->getTemplateHeight () 
            / templateDetector_->getSamplingSize ();
          unsigned char * templateData = new unsigned char[
            templateHorizontalSamples * templateVerticalSamples];

          const int horizontalSamples = imageWidth 
            / templateDetector_->getSamplingSize ();


          // copy data
          int maxResponse = 0;
          for (int rowIndex = 0; rowIndex < templateVerticalSamples; ++rowIndex)
          {
            for (int colIndex = 0; colIndex < templateHorizontalSamples; ++colIndex)
            {
              const int tmpColIndex = colIndex + startX/regionWidth;
              const int tmpRowIndex = rowIndex + startY/regionHeight;
              
              templateData[rowIndex*templateHorizontalSamples + colIndex] 
                = dataMsg->data[tmpRowIndex*horizontalSamples + tmpColIndex];

              if (dataMsg->data[tmpRowIndex*horizontalSamples + tmpColIndex] != 0)
              {
                ++maxResponse;
              }
            }
          }
          
          
          // add template; the template position is the position of the
          // detected candidate (startX/startY), not of the mouse pointer
          const int templateWidth = templateDetector_->getTemplateWidth ();
          const int templateHeight = templateDetector_->getTemplateHeight ();
          
          const int templateStartX = startX;
          const int templateStartY = startY;
          
          templateDetector_->addNewTemplate (templateData, 0, maxResponse, 
            cvRect (templateStartX, templateStartY, templateWidth, templateHeight));
        
          cvSetImageROI (smoothedImage, 
            cvRect (templateStartX, templateStartY, 
              templateDetector_->getTemplateWidth (), 
              templateDetector_->getTemplateHeight ()));
          templateDetector_->addContour (smoothedImage);
          cvResetImageROI (smoothedImage);


          ROS_INFO ("template added");
          
          delete[] templateData;
          
          templateDetector_->clearClusters ();
          templateDetector_->clusterHeuristically (4);
        }
      }

      // free memory
      for (int classIndex = 0; 
        classIndex < templateDetector_->getNumOfClasses(); 
        ++classIndex)
      {
        ::ftd::emptyPointerList(candidateList[classIndex]);
      }
    }
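
The excerpt ends here. In the complete source the callback additionally displays the visualization image and frees the buffers allocated above; the following is only a rough sketch of what this amounts to, assuming no buffers are kept across calls (see adaptive_template_learner.cpp for the authoritative version):

    // show the visualization (the window was created in the constructor)
    cvShowImage ("Image Window", colorImage);
    cvWaitKey (10);

    // release the buffers allocated in this callback; the image obtained from
    // the CvBridge must not be released, as the bridge owns it. Depending on
    // the FastTemplateDetectorVS API, the candidate list array returned by
    // process() may also need to be freed.
    delete[] data;
    cvReleaseImage (&colorImage);
    cvReleaseImage (&grayImage);
    cvReleaseImage (&smoothedImage);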

Compiling and Running the program

As stated before, the source code of the presented program is already available in the SVN repository (see fast_template_detector/src/learners/adaptive_template_learner.cpp). Therefore you only have to go to the fast_template_detector directory and run 'rosmake' to compile it:

$ roscd fast_template_detector
$ rosmake

To run the program we need two consoles. In the first one we have to run the discretizer, e.g. a gradient discretizer:

$ ./run_gradient_discretizer.sh

In the second console we run the learner:

$ ./run_adaptive_template_learner.sh

The above scripts use the image data published on '/narrow_stereo/left/image_raw' for processing. If you want to use different image data you have to adapt the scripts accordingly.
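
For example, to list the image topics that are currently available before editing the scripts:

$ rostopic list | grep image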
